(54) Method and system for establishing a quorum for a geographically distributed cluster of 
computers 



(57) One embodiment of the present invention pro- 
vides a system that facilitates establishing a quorum for 
a cluster of computers that are geographically distribut- 
ed. The system operates by detecting a change in mem- 
bership of the cluster. Upon detecting the change, the 
system forms a potential new cluster by attempting to 
communicate with all other computers within the cluster. 



The system accumulates votes for each computer suc- 
cessfully contacted. The system also attempts to gain 
control of a quorum server located at a site separate 
from all computers within the cluster. If successful at 
gaining control, the system accumulates the quorum 
server's votes as well. If the total of accumulated votes 
is a majority of the available votes, the system forms a 
new cluster from the potential new cluster. 
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ter with nodes that are widely separated, by potentially 
thousands of miles, in order to provide reliability in the 
event of a local disaster. This separation poses prob- 
lems for the quorum configuration. If the quorum device 
is located with either node, a disaster at that site could 
destroy both the node and the quorum device, effective- 
ly preventing the other node from taking control. In ad- 
dition, connecting a quorum device such as a SCSI disk 
over these long distances can be extremely expensive 
or impossible. 

[0014] Accordingly, one embodiment of the present 
invention provides a system that facilitates establishing 
a quorum for a cluster of computers that are geograph- 
ically distributed. The system operates by detecting a 
change in membership of the cluster. Upon detecting the 
change, the system forms a potential new cluster by attempting 
to communicate with all other computers within 
the cluster. The system accumulates votes for each 
computer successfully contacted. The system also at- 
tempts to gain control of a quorum server located at a 
site separate from all computers within the cluster. If 
successful at gaining control, the system accumulates 
the quorum server's vote or votes as well. If the total of 
accumulated votes comprises a majority of the available 
votes, the system forms a new cluster from the potential 
new cluster. 
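
The vote accounting described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the `Member` type, the `has_quorum` function, the node names, and the one-vote-per-member default are all hypothetical, and the sketch assumes that a computer counts its own vote along with the votes of the computers it reaches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """A voting participant; names and vote counts are illustrative."""
    name: str
    votes: int = 1


def has_quorum(local, reachable_peers, all_members, quorum_server, won_quorum_server):
    """Return True if the accumulated votes are a strict majority of all votes."""
    # Available votes: every computer in the cluster plus the quorum server.
    total_available = sum(m.votes for m in all_members) + quorum_server.votes
    # Assumption: the local computer counts its own vote.
    accumulated = local.votes + sum(p.votes for p in reachable_peers)
    if won_quorum_server:
        accumulated += quorum_server.votes
    # Strict majority: more than half of the available votes.
    return 2 * accumulated > total_available


node_a = Member("computer-102")
node_b = Member("computer-104")
qs = Member("quorum-server-106")

# The inter-site link is down and computer 102 wins control of the quorum server.
print(has_quorum(node_a, [], [node_a, node_b], qs, True))   # True  (2 of 3 votes)
print(has_quorum(node_b, [], [node_a, node_b], qs, False))  # False (1 of 3 votes)
```

With two computers and one quorum server holding one vote each, at most one side of a partition can reach two of the three votes, so at most one new cluster forms.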

[0015] In one embodiment of the present invention, 
the system exchanges heartbeat messages with all oth- 
er computers that are part of the cluster. Upon discov- 
ering an absence of heartbeat messages from any com- 
puter in the cluster, the system initiates a cluster mem- 
bership protocol. 
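
Heartbeat-based detection of this kind might be sketched as below. The `HeartbeatMonitor` class, the five-second timeout, and the injectable clock are assumptions made for illustration; the description does not specify a timeout or a message format.

```python
import time


class HeartbeatMonitor:
    """Tracks when each peer was last heard from (illustrative sketch)."""

    def __init__(self, peers, timeout_s=5.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        now = clock()
        # Treat every peer as freshly seen at start-up.
        self._last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer):
        """Call whenever a heartbeat message arrives from `peer`."""
        self._last_seen[peer] = self._clock()

    def silent_peers(self):
        """Return the peers whose heartbeats have been absent longer than the timeout."""
        now = self._clock()
        return sorted(p for p, seen in self._last_seen.items()
                      if now - seen > self._timeout_s)

    def membership_change_detected(self):
        """True when any peer has gone silent, i.e. the membership protocol should start."""
        return bool(self.silent_peers())
```

A caller would invoke `record_heartbeat` on each received message and poll `membership_change_detected` periodically, initiating the cluster membership protocol when it returns true.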

[0016] In one embodiment of the present invention, 
detecting the change in cluster membership includes 
detecting that the cluster has not been formed. 
[0017] In one embodiment of the present invention, 
attempting to gain control of the quorum server involves 
communicating with the quorum server using crypto- 
graphic techniques. 

[0018] In one embodiment of the present invention, 
the system exchanges a status message with each 
member of the new cluster. The system updates the lo- 
cal status of the computer to the most recent status 
available within the status messages. 
[0019] Another embodiment of the present invention 
provides a system that facilitates establishing a quorum 
for a cluster of computers that are geographically dis- 
tributed. The system provides a quorum server at a site 
separate from a location of any computer within the clus- 
ter. The system assigns at least one vote to each com- 
puter within the cluster. The system also assigns at least 
one vote to the quorum server. In operation, the system 
attempts to establish communications between each 
pair of computers within the cluster. A count of votes is 
accumulated at each computer for each computer that 
responds. The system also attempts to establish control 
over the quorum server from each computer within the 
cluster. If control is established over the quorum server, 
the quorum server's vote(s) are accumulated in the 
count of votes. The system establishes a quorum when 
a majority of available votes has been accumulated in 
the count of votes. 

[0020] In one embodiment of the present invention, 
the quorum server grants control to only a first computer 
attempting to establish control. Another approach is for 
the quorum server to grant control to only one computer 
out of all the computers attempting to establish control 
based on a pre-established priority list. 
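
The two granting policies may be contrasted with a short sketch. The function names and node identifiers are hypothetical, and ignoring requesters that do not appear on the priority list is an assumption; the description does not say how such requesters are handled.

```python
def grant_first_come(requests):
    """Grant control to the earliest requester; `requests` is in arrival order."""
    return requests[0] if requests else None


def grant_by_priority(requests, priority_list):
    """Grant control to the requester that ranks highest in `priority_list`."""
    rank = {name: position for position, name in enumerate(priority_list)}
    # Assumption: requesters absent from the priority list are ignored.
    candidates = [r for r in requests if r in rank]
    return min(candidates, key=rank.__getitem__) if candidates else None


arrivals = ["computer-104", "computer-102"]
print(grant_first_come(arrivals))                                    # computer-104
print(grant_by_priority(arrivals, ["computer-102", "computer-104"]))  # computer-102
```

In either policy exactly one requester is selected, which is what prevents two partitions from both collecting the quorum server's votes.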

[0021] In one embodiment of the present invention, 
votes are assigned so that the quorum includes at least 
one computer that was in an immediately previous clus- 
ter. This ensures that a cluster formed from the quorum 
has current data. 

[0022] In one embodiment of the present invention, 
attempting to establish control over the quorum server 
involves establishing communications with the quorum 
server. Note that cryptographic techniques may be em- 
ployed here to deter attacks. 
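
The description does not name a particular cryptographic technique. As one possible illustration, a control request could be authenticated with an HMAC over a server-issued nonce. The shared key, the message layout, and the function names below are assumptions, not part of the described system.

```python
import hashlib
import hmac
import secrets


def sign_request(shared_key: bytes, node_id: str, nonce: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag binding the requesting node to the nonce."""
    message = node_id.encode("utf-8") + b"|" + nonce
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify_request(shared_key: bytes, node_id: str, nonce: bytes, tag: bytes) -> bool:
    """Check the tag on the quorum server side using a constant-time comparison."""
    expected = sign_request(shared_key, node_id, nonce)
    return hmac.compare_digest(expected, tag)


key = secrets.token_bytes(32)    # hypothetical key provisioned out of band
nonce = secrets.token_bytes(16)  # fresh per request, issued by the quorum server
tag = sign_request(key, "computer-102", nonce)
print(verify_request(key, "computer-102", nonce, tag))  # True
print(verify_request(key, "computer-104", nonce, tag))  # False
```

Using a fresh nonce for each request is intended to keep a captured request from being replayed later.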

[0023] Various embodiments in accordance with the 
invention will now be described in detail by way of ex- 
ample only, with reference to the following drawings: 

FIG. 1 illustrates a geographically distributed clus- 
ter of computers coupled together in accordance 
with one embodiment of the present invention. 

FIG. 2 is a flowchart illustrating the process of de- 
tecting and processing a failure within a cluster in 
accordance with one embodiment of the present in- 
vention. 

FIG. 3 is a flowchart illustrating the process of de- 
termining cluster membership, such as may be 
used in the process of FIG. 2. 

FIG. 4 is a flowchart illustrating the process of grant- 
ing control of a quorum server such as shown in 
FIG. 1. 

FIG. 5 is a flowchart illustrating the process of 
reconfiguring a computer within a cluster, such as 
may be used in the process of FIG. 2. 

Computer Cluster 

[0024] FIG. 1 illustrates a geographically distributed 
cluster of computers coupled together in accordance 
with one embodiment of the present invention. Comput- 
ers 102 and 104 form a cluster of computers that operate 
in concert to provide services and data to users. Two or 
more computers are formed into a cluster to provide 
speed and reliability for the users. Computers 102 and 
104 are located in geographic areas 120 and 122 re- 
spectively. Geographic areas 120 and 122 are widely 
separated, possibly by thousands of miles, in order to 
provide survivability for the cluster in case of a local dis- 
aster at geographic area 120 or 122. For example, ge- 
ographic area 120 may be located in California, while 
geographic area 122 may be located in New York. 
[0025] Computers 102 and 104 can generally include 

nodes of the membership of the new cluster (step 316). 
Note that the above steps are being accomplished by 
all computers in the system simultaneously. 

Controlling Quorum Server 

[0039] FIG. 4 is a flowchart illustrating the process of 
granting control of quorum server 106 in accordance 
with one embodiment of the present invention. The sys- 
tem starts when quorum server 106 receives a request 
for control from a node in the proposed new cluster (step 
402). Next, quorum server 106 determines if the re- 
questing node was on the list of nodes for the previous 
cluster (step 404). If the requesting node was not on the 
list of nodes for the previous cluster, quorum server 106 
determines if the list of nodes for the previous cluster is 
empty (step 406). Note that an empty list indicates that 
a cluster had never been formed and this request is part 
of initializing a cluster for the first time. If the cluster list 
is not empty at 406, quorum server 106 denies the re- 
quest to control quorum server 106 (step 408). 
[0040] If the node was on the previous cluster list at 
404 or if the cluster list is empty at 406, quorum server 
106 sets the cluster list to contain only the requesting 
node (step 410). (It will be apparent to a person of ordi- 
nary skill in the art that there are other ways to reset the 
list, including receiving a list of nodes from the request- 
ing node to include in the list, or receiving a list of nodes 
from the requesting node to exclude from the list). Final- 
ly, quorum server 106 affirms the request to control quo- 
rum server 106 and grants its vote(s) to the requesting 
node (step 412). 
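
The decision flow of FIG. 4 can be sketched as a small class. The `QuorumServer` class, its method names, and the string node identifiers are hypothetical; the step numbers in the comments refer to the flowchart steps described above.

```python
class QuorumServer:
    """Illustrative sketch of the control-granting flow of FIG. 4."""

    def __init__(self, votes=1):
        self.votes = votes
        # Nodes of the previous cluster; empty until a cluster is first formed.
        self.cluster_list = set()

    def request_control(self, node):
        """Handle a control request (step 402); return the votes granted, or 0 if denied."""
        on_previous_list = node in self.cluster_list   # step 404
        never_formed = not self.cluster_list           # step 406
        if not on_previous_list and not never_formed:
            return 0                                   # step 408: deny the request
        self.cluster_list = {node}                     # step 410: reset list to requester
        return self.votes                              # step 412: grant the vote(s)

    def set_cluster_list(self, nodes):
        """Record the members of the newly formed cluster (FIG. 5, step 508)."""
        self.cluster_list = set(nodes)


qs = QuorumServer()
print(qs.request_control("computer-102"))  # 1: the list was empty, first formation
print(qs.request_control("computer-104"))  # 0: not on the list, which is now non-empty
```

Because step 410 resets the list to the requester alone, a second partitioned node that asks afterwards is denied, even if it belonged to the previous cluster.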

Reconfiguring a Computer 

[0041] FIG. 5 is a flowchart illustrating the process of 
reconfiguring a computer within a cluster in accordance 
with one embodiment of the present invention. The sys- 
tem starts when a computer, say computer 102, re- 
ceives status data from other nodes in the new cluster 
(step 502). Next, computer 102 determines which set of 
status data is the most recent (step 504). 
[0042] Computer 102 updates its own internal status 
to conform with the most recent status data available 
(step 506). Finally, computer 102 informs quorum server 
106 which nodes to include in the new cluster list (step 
508). 
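
Steps 502 to 506 amount to selecting the newest status among those available. The sketch below assumes that recency is tracked by a monotonically increasing `version` counter; that counter, the `Status` type, and the `reconfigure` function are hypothetical, since the description does not state how recency is determined.

```python
from dataclasses import dataclass


@dataclass
class Status:
    version: int   # hypothetical counter; a higher value means more recent
    payload: dict  # illustrative cluster state


def reconfigure(local, received):
    """Return the most recent status among the local status and those received."""
    # Steps 502-504: compare the received status data with the local status.
    # Step 506: the caller adopts the returned status as its own.
    return max([local, *received], key=lambda s: s.version)


local = Status(3, {"primary": "computer-102"})
received = [Status(5, {"primary": "computer-104"}), Status(4, {"primary": "computer-102"})]
print(reconfigure(local, received).version)  # 5
```

If the local status is already the newest, it is returned unchanged and no update is needed.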

[0043] The data structures and code described herein 
for implementing the establishment of a quorum are typ- 
ically stored on a computer readable storage medium, 
which may be any device or medium that can store code 
and/or data for use by a computer system. This includes, 
but is not limited to, magnetic and optical storage devic- 
es such as disk drives, magnetic tape, CDs (compact 
discs) and DVDs (digital versatile discs or digital video 
discs), and computer instruction signals embodied in a 
transmission medium (with or without a carrier wave up- 
on which the signals are modulated). For example, the 
transmission medium may include a communications 
network, such as the Internet. 

[0044] The foregoing description of various embodi- 
ments of the present invention has been provided in the 
context of a particular application, and for the purpose 
of illustration only. Many other modifications and varia- 
tions will be apparent to practitioners skilled in the art, 
and so the scope of the present invention is not limited 
to the particular embodiments shown, but rather is de- 
fined by the appended claims and equivalents thereof. 



Claims 

1. A method for facilitating the establishment of a quo- 
rum for a cluster within a plurality of computers that 
are geographically distributed, the method compris- 
ing the steps of: 

detecting a change in membership of the clus- 
ter at a computer within the plurality of comput- 
ers; and 

upon detecting the change in membership, 
forming a potential new cluster by attempting 
to communicate with all other computers within 
the plurality of computers, 
accumulating votes for each computer 
successfully contacted, 
attempting to gain control of a quorum server 
located at a site separate from all computers 
within the plurality of computers, 
if successful, accumulating the quorum serv- 
er's votes, and 
if the total of accumulated votes represents a 
majority of the available votes, forming a new 
cluster from the potential new cluster. 

2. The method of claim 1, wherein the step of detecting 
a change in membership includes the steps of: 

exchanging heartbeat messages with all com- 
puters that are part of the cluster; and 
upon discovering an absence of a heartbeat 
message from any computer in the cluster, ini- 
tiating a cluster membership protocol. 

3. The method of claim 1, wherein the step of detecting 
the change in cluster membership includes detect- 
ing that the cluster has not been formed. 

4. The method of any preceding claim, wherein the 
step of attempting to gain control of the quorum 
server includes communicating with the quorum 
server using cryptographic techniques. 

5. The method of any preceding claim, further com- 
prising the steps of: 

15. A system to facilitate establishing a quorum for a 
cluster within a plurality of computers that are geo- 
graphically distributed, wherein the plurality of com- 
puters are coupled together by a network, the sys- 
tem comprising: 

a quorum server located at a site separate from 
any one computer of the plurality of computers; 
and 

an independent communications link for cou- 
pling each computer of the plurality of comput- 
ers to the quorum server. 

16. The system of claim 15, wherein the quorum server 
includes a mechanism for granting control to only 
one computer of the plurality of computers request- 
ing control. 

17. The system of claim 15, wherein the quorum server 
includes a mechanism for maintaining a list of com- 
puters accepted into the cluster. 

18. The system of any of claims 15 to 17, wherein the 
quorum server includes a mechanism for crypto- 
graphically ensuring an identity of a computer at- 
tempting to establish control. 

19. The system of any of claims 15 to 18, wherein the 
quorum server includes monitoring means to mon- 
itor the status of each computer within the plurality 
of computers. 